Pandas: Remove non

您所在的位置:网站首页 numeric index Pandas: Remove non

Pandas: Remove non

2024-05-10 02:34| 来源: 网络整理| 查看: 265

# Table of ContentsPandas: Remove non-numeric rows in a DataFrame columnPandas: Remove non-numeric rows in a column in DataFrame using apply()Pandas: Remove non-numeric rows in a column in DataFrame using isnumeric()# Pandas: Remove non-numeric rows in a DataFrame column

To remove the non-numeric rows in a column in a Pandas DataFrame:

Use the pandas.to_numeric() method to convert the values in the column to numeric.Set the errors argument to "coerce", so non-numeric values get set to NaN.Remove the NaN values using DataFrame.notnull().main.pyCopied!import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) print(df) print('-' * 50) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] print(only_numeric) The code for this article is available on GitHub

Running the code sample produces the following output.

shellCopied! first_name salary experience 0 Alice 175.1 10 1 Bobby b 15 2 Carl 190.3 20 3 Dan d 25 -------------------------------------------------- first_name salary experience 0 Alice 175.1 10 2 Carl 190.3 20

The pandas.to_numeric() method converts the given argument to a numeric type.

We set the errors keyword argument to "coerce" so that values that cannot be parsed, get set as NaN.

main.pyCopied!import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 175.1 # 1 NaN # 2 190.3 # 3 NaN # Name: salary, dtype: float64 print(pd.to_numeric(df['salary'], errors='coerce')) The code for this article is available on GitHub

Notice that the 2 values in the "salary" column that cannot get converted to numeric values are set as NaN after calling to_numeric() are set to NaN.

The last step is to use the DataFrame.notnull() method to get a mask of boolean values that indicates whether an element is not an NA value.

main.pyCopied!import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) # 0 True # 1 False # 2 True # 3 False # Name: salary, dtype: bool print(pd.to_numeric(df['salary'], errors='coerce').notnull())

The method returns True if the value is not NA and False otherwise.

You can then use the boolean mask to select the numeric rows.

main.pyCopied!import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[ pd.to_numeric(df['salary'], errors='coerce').notnull() ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric) # Pandas: Remove non-numeric rows in a column in DataFrame using apply()

You can also use the DataFrame.apply() method to remove the non-numeric rows in a column.

First, make sure that you have the numpy module installed.

shellCopied!pip install numpy # or with pip3 pip3 install numpy

Now, import the module and use the isinstance function with DataFrame.apply().

main.pyCopied!import pandas as pd import numpy as np df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 'b', 190.3, 'd'], 'experience': [10, 15, 20, 25] }) only_numeric = df[df['salary'].apply( lambda x: isinstance( x, (int, np.int64, float, np.float64))) ] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)

The code for this article is available on GitHub

The DataFrame.apply() method applies a function along an axis of the DataFrame.

We used the isinstance() function to check if each value is numeric.

The resulting DataFrame only contains the "salary" rows that store numeric values.

The code sample considers numeric values ones that have a type of:

intnp.int64floatnp.float64

You can adjust this in the call to isinstance() depending on your use case.

# Pandas: Remove non-numeric rows in a column in DataFrame using isnumeric()

If you only consider integer values numeric, you can also use the str.isnumeric method.

main.pyCopied!import pandas as pd df = pd.DataFrame({ 'first_name': ['Alice', 'Bobby', 'Carl', 'Dan'], 'salary': [175.1, 180.5, 190.3, 203.3], 'experience': [10, 'b', 20, 'd'] }) only_numeric = df[df['experience'].astype('str').str.isnumeric()] # first_name salary experience # 0 Alice 175.1 10 # 2 Carl 190.3 20 print(only_numeric)

The code for this article is available on GitHub

We used the DataFrame.astype method to convert the values in the "experience" column to strings.

The str.isnumeric method returns True if all characters in the string are numeric, and there is at least one character, otherwise False is returned.

Note that the str.isnumeric() method returns False for negative numbers (they contain a minus) and for floats (they contain a period).

main.pyCopied!print('10'.isnumeric()) # 👉️ True print('50'.isnumeric()) # 👉️ True print('-100'.isnumeric()) # 👉️ False print('3.14'.isnumeric()) # 👉️ False print('A'.isnumeric()) # 👉️ False

This approach should only be used when you only consider integers to be numeric values.

# Additional Resources

You can learn more about the related topics by checking out the following tutorials:

Pandas: Get Nth row or every Nth row in a DataFramePandas: Select first N or last N columns of DataFramePandas: DataFrame.reset_index() not working [Solved]How to Create a Set from a Series in PandasPandas: Remove non-numeric rows in a DataFrame columnNumPy: Apply a Mask from one Array to another ArrayHow to iterate over the Columns of a NumPy ArrayPandas: Select rows based on a List of IndicesPandas: Find an element's Index in Series [7 Ways]Matplotlib: No artists with labels found to put in legendFirst argument must be an iterable of pandas objects [Fix]ValueError: Index contains duplicate entries, cannot reshapePandas: Setting column names when reading a CSV fileExport a Pandas DataFrame to Excel without the IndexMust have equal len keys and value when setting with iterableTypeError: '(slice(None, None, None), 0)' is an invalid keyERROR: YouTube said: Unable to extract video data [Solved]RuntimeError: CUDA out of memory. Tried to allocate X MiB


【本文地址】


今日新闻


推荐新闻


CopyRight 2018-2019 办公设备维修网 版权所有 豫ICP备15022753号-3